64 research outputs found

    Modeling context words as regions: an ordinal regression approach to word embedding

    Vector representations of word meaning have found many applications in the field of natural language processing. Word vectors intuitively represent the average context in which a given word tends to occur, but they cannot explicitly model the diversity of these contexts. Although region representations of word meaning offer a natural alternative to word vectors, only a few methods have been proposed that can effectively learn word regions. In this paper, we propose a new word embedding model which is based on ordinal regression. We show that the underlying ranking interpretation of word contexts is sufficient to match, and sometimes surpass, the performance of popular methods such as Skip-gram. Furthermore, we show that by using a quadratic kernel, we can effectively learn word regions, which outperform existing unsupervised models for the task of hypernym detection.
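    To illustrate the region idea, the following is a minimal sketch, not the authors' implementation: with a degree-2 (quadratic) kernel, the learned decision surface is quadratic in the context vector, so the set of contexts scored above a threshold forms a region in context space. The synthetic context vectors, the binary positive/negative split, and the use of scikit-learn's SVC are all assumptions standing in for the paper's ordinal regression over ranked contexts.

        # Hypothetical sketch: learn a quadratic "region" scorer for one target word.
        import numpy as np
        from sklearn.svm import SVC  # assumption: an SVM with a quadratic kernel
                                     # stands in for the ordinal regression model

        rng = np.random.default_rng(0)
        dim = 50
        pos = rng.normal(0.5, 0.3, size=(200, dim))   # contexts the word occurs in
        neg = rng.normal(-0.5, 0.3, size=(200, dim))  # sampled negative contexts

        X = np.vstack([pos, neg])
        y = np.array([1] * 200 + [0] * 200)

        # A degree-2 polynomial kernel makes the decision surface quadratic,
        # so {v : f(v) >= 0} is a region (e.g. an ellipsoid) in context space.
        clf = SVC(kernel="poly", degree=2, coef0=1.0).fit(X, y)

        def in_region(context_vec):
            """True if a context vector falls inside the word's learned region."""
            return clf.decision_function(context_vec.reshape(1, -1))[0] >= 0.0

        print(in_region(rng.normal(0.5, 0.3, dim)))   # likely True
        print(in_region(rng.normal(-0.5, 0.3, dim)))  # likely False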

    Word and Document Embedding with vMF-Mixture Priors on Context Word Vectors

    Word embedding models typically learn two types of vectors: target word vectors and context word vectors. These vectors are normally learned such that they are predictive of some word co-occurrence statistic, but they are otherwise unconstrained. However, the words from a given language can be organized in various natural groupings, such as syntactic word classes (e.g. nouns, adjectives, verbs) and semantic themes (e.g. sports, politics, sentiment). Our hypothesis in this paper is that embedding models can be improved by explicitly imposing a cluster structure on the set of context word vectors. To this end, our model relies on the assumption that context word vectors are drawn from a mixture of von Mises-Fisher (vMF) distributions, where the parameters of this mixture distribution are jointly optimized with the word vectors. We show that this results in word vectors which are qualitatively different from those obtained with existing word embedding models. We furthermore show that our embedding model can also be used to learn high-quality document representations.
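    For reference, the standard von Mises-Fisher density on the unit sphere (a textbook form, not quoted from the paper) is

        f(x; \mu, \kappa) = C_d(\kappa) \exp(\kappa \mu^\top x),
        \qquad C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2} I_{d/2-1}(\kappa)},

    where \|\mu\| = 1 is the mean direction, \kappa \ge 0 the concentration, and I_\nu the modified Bessel function of the first kind. The assumed mixture prior over a context word vector c then takes the form

        p(c) = \sum_{k=1}^{K} \pi_k \, f(c; \mu_k, \kappa_k),

    with mixture weights \pi_k optimized jointly with the word vectors.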

    A unified posterior regularized topic model with maximum margin for learning-to-rank

    While most methods for learning-to-rank documents only consider relevance scores as features, better results can often be obtained by taking into account the latent topic structure of the document collection. Existing approaches that consider latent topics follow a two-stage procedure: topics are first discovered in an unsupervised way and then used as features for the learning-to-rank task. In contrast, we propose a learning-to-rank framework which integrates the supervised learning of a maximum margin classifier with the discovery of a suitable probabilistic topic model. In this way, the labelled data that is available for the learning-to-rank task can be exploited to identify the most appropriate topics. To this end, we use a unified constrained optimization framework, which can dynamically compute the latent topic similarity score between the query and the document. Our experimental results show a consistent improvement over state-of-the-art learning-to-rank models.
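    One plausible shape for such a unified objective (a hedged sketch; the paper's exact formulation may differ) couples the topic model's negative log-likelihood with a pairwise max-margin ranking loss, so that labelled preferences regularize the learned topics:

        \min_{\Theta, w} \; -\log p(\mathcal{D} \mid \Theta)
        + C \sum_{(q,\, d_i \succ d_j)} \max\bigl(0,\; 1 - [s_w(q, d_i; \Theta) - s_w(q, d_j; \Theta)]\bigr),

    where \Theta are the topic-model parameters, s_w is a scoring function that includes the query-document topic similarity, d_i \succ d_j ranges over labelled preference pairs, and C trades off likelihood against margin violations. All symbols here are illustrative, not the paper's notation.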

    Web Query Reformulation via Joint Modeling of Latent Topic Dependency and Term Context

    An important way to improve users’ satisfaction in Web search is to assist them by issuing more effective queries. One such approach is query reformulation, which generates new queries based on the query currently issued by the user. A common procedure for conducting reformulation is to first generate candidate queries and then assess them with a scoring method. Most existing methods are context based: they rely heavily on contextual relations between terms in past queries and cannot detect or maintain the semantic consistency of queries. In this article, we propose a graphical model to score queries. The proposed model exploits a latent topic space, automatically derived from the query log, to detect semantic dependencies between the terms of a query as well as dependencies among topics. Meanwhile, the graphical model also captures term context in past queries through skip-bigram and n-gram language models. In addition, our model can easily be extended to take users’ historical search interests into account when reformulating queries for different users. For the task of candidate query generation, we investigate a social tagging resource, Delicious bookmarks, to generate addition and substitution patterns that supplement the patterns generated from query log data.
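    To make the scoring idea concrete, here is a minimal sketch under stated assumptions: a toy query log, add-alpha smoothing, and an unweighted sum of bigram and skip-bigram log-probabilities stand in for the full graphical model (which also conditions on latent topics).

        # Hypothetical sketch: score reformulation candidates with bigram and
        # skip-bigram statistics estimated from a toy query log.
        import math
        from collections import Counter
        from itertools import combinations

        query_log = [
            "cheap flight tickets",
            "cheap airline tickets",
            "flight tickets online",
        ]

        unigrams, bigrams, skip_bigrams = Counter(), Counter(), Counter()
        for q in query_log:
            terms = q.split()
            unigrams.update(terms)
            bigrams.update(zip(terms, terms[1:]))        # adjacent pairs
            skip_bigrams.update(combinations(terms, 2))  # ordered pairs, any gap

        V = len(unigrams)
        ALPHA = 0.5  # add-alpha smoothing constant (assumed)

        def log_prob(pairs, table):
            """Smoothed log-probability of a sequence of term pairs."""
            return sum(
                math.log((table[(a, b)] + ALPHA) / (unigrams[a] + ALPHA * V))
                for a, b in pairs
            )

        def score(candidate):
            terms = candidate.split()
            return (log_prob(zip(terms, terms[1:]), bigrams)
                    + log_prob(combinations(terms, 2), skip_bigrams))

        for cand in ["cheap flight tickets", "cheap tickets online"]:
            print(cand, round(score(cand), 3))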
    • …